Implicit radix sorting on GPUs
نویسندگان
چکیده
In this chapter, we present a high performance sorting function on GPUs that is able to exploit the parallel processing power and memory bandwidth of modern GPUs to sort large quantities of data at a very high speed. We revisit the traditional radix sorting framework, analyze the weaknesses, and then propose a solution based on the implicit counting data presentation and its associated operations. We also improve the bandwidth utilization with our hybrid data structure and redefine the concept of arithmetic intensity as a guidance for GPU optimization process.
منابع مشابه
Sorting On A Graphics Processing Unit(GPU)
2.1 Graphics Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.2 Sorting Numbers on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.2.1 SDK Radix Sort Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.2.1.1 Step 1–Sorting tiles ...
متن کاملRevisiting Sorting for GPGPU Stream Architectures1
This report presents efficient strategies for sorting large sequences of fixed-length keys (and values) using GPGPU stream processors. Our radix sorting methods demonstrate sorting rates of 482 million key-value pairs per second, and 550 million keys per second (32-bit). Compared to the state-of-the-art, our implementations exhibit speedup of at least 2x for all fully-programmable generations o...
متن کاملSorting on GPUs for large scale datasets: A thorough comparison
Although sort has been extensively studied in many research works, it still remains a challenge in particular if we consider the implications of novel processor technologies such as manycores (i.e. GPUs, Cell/BE, multicore, etc.). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and we discuss their viability on large datasets (such as those managed ...
متن کاملA Hybrid Sorting Algorithm on Heterogeneous Architectures
Nowadays high performance computing devices are more common than ever before. The capacity of main memories becomes very huge; CPUs get more cores and computing units that have greater performance. There are more and more machines get accelerators such as GPUs, too. Take full advantages of modern machines that use heterogeneous architectures to get higher performance solutions is a real challen...
متن کاملFast 4-way parallel radix sorting on GPUs
Efficient sorting is a key requirement for many computer science algorithms. Acceleration of existing techniques as well as developing new sorting approaches is crucial for many realtime graphics scenarios, database systems, and numerical simulations to name just a few. It is one of the most fundamental operations to organize and filter the ever growing massive amounts of data gathered on a dai...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010